Best of LessWrong 2022

Nate Soares reviews a dozen plans and proposals for making AI go well. He finds that almost none of them grapple with what he considers the core problem - capabilities will suddenly generalize way past training, but alignment won't.

evhub
COI: I work at Anthropic and I ran this by Anthropic before posting, but all views are exclusively my own. I got a question about Anthropic's partnership with Palantir using Claude for U.S. government intelligence analysis and whether I support it and think it's reasonable, so I figured I would just write a shortform here with my thoughts. First, I can say that Anthropic has been extremely forthright about this internally, and it didn't come as a surprise to me at all. Second, my personal take would be that I think it's actually good that Anthropic is doing this. If you take catastrophic risks from AI seriously, the U.S. government is an extremely important actor to engage with, and trying to just block the U.S. government out of using AI is not a viable strategy. I do think there are some lines that you'd want to think about very carefully before considering crossing, but using Claude for intelligence analysis seems definitely fine to me. Ezra Klein has a great article on "The Problem With Everything-Bagel Liberalism" and I sometimes worry about Everything-Bagel AI Safety where e.g. it's not enough to just focus on catastrophic risks, you also have to prevent any way that the government could possibly misuse your models. I think it's important to keep your eye on the ball and not become too susceptible to an Everything-Bagel failure mode.
How to prepare for the coming Taiwan Crisis? Should one short TSMC? Dig a nuclear cellar? Metaculus gives a 25% chance of a full-scale invasion of Taiwan within 10 years and a 50% chance of a blockade. It gives a 65% chance that, if China invades Taiwan before 2035, the US will respond with military force. Metaculus has very strong calibration scores (apparently better than prediction markets), so I am inclined to take these numbers as the best guess we currently have of the situation. Is there any way to act on this information?
It seems the pro-Trump Polymarket whale may have had a real edge after all. The Wall Street Journal reports (paywalled link, screenshot) that he's a former professional trader who commissioned his own polls from a major polling firm, using an alternate methodology (the neighbor method, i.e. asking respondents who they expect their neighbors to vote for) that he thought would be less biased by preference falsification. I didn't bet against him, though I strongly considered it; feeling glad this morning that I didn't.
lc
Sometimes people say "before we colonize Mars, we have to be able to colonize Antarctica first". What are the actual obstacles to doing that? Is there any future tech somewhere down the tree that could fix its climate, etc.?
quila
nothing short of death can stop me from trying to do good. the world could destroy or corrupt EA, but i'd remain an altruist. it could imprison me, but i'd stay focused on alignment, as long as i could communicate to at least one on the outside. even if it tried to kill me, i'd continue in the paths through time where i survived.

Recent Discussion

  1. Altruism is truly selfless, and it’s good.
  2. Altruism is truly selfless, and it’s bad.
  3. Altruism is enlightened self-interest, which is good.
  4. Altruism is disguised/corrupted/decadent self-interest, which is bad.

To illustrate further, though at the risk of oversimplifying…

One exponent of option #1 would be Auguste Comte, who thought that living for others was the foundation of true morality and of the best society.[1]

An exponent of option #2 would be Ayn Rand, who thought that altruism was indeed a doctrine of selflessness, but that this was the antithesis of true morality, and a threat to people.[2]

An exponent of option #3 would be Pierre Cérésole, who felt that altruism is what results when you refine your self-interest successfully and rid it of its mistakes.[3]

An exponent of option #4 would be Nietzsche, who thought altruism...

Answer by StartAtTheEnd

I don't think any one option is precise enough that it's correct on its own, so I will have to say "5" as well.

Here's my take:

  • Altruism can be a result of both good and bad mental states.
  • Helping others tends to be good for them, at least temporarily.
  • Helping people can prevent them from helping themselves, and from growing.
  • To help something exist that wouldn't exist without your help is to get in the way of natural selection, which over time can result in many groups that are a net negative for society, in that they require more than they provide. They might
... (read more)
Answer by Tapatakt
Sometimes altruism is truly selfless (if we don't use too broad a tautological definition of self-interest). Sometimes altruism is actually enlightened/disguised/corrupted/decadent self-interest. I feel like there is some sense in which the first kind is better than the second, but can we have more of whatever kind, please?
Answer by Ustice
All of these and more? I think it's a trap to make absolute statements about things like altruism. I think that there are good people who give for good reasons, and good people who give for questionable reasons. Helping others seems to generally be positive, but there are limits. Some people give to manipulate others. True selflessness is an impossibility, and therefore a toothless bogeyman. Altruism is complicated. In order to really judge the nature of altruism, we would have to be able to attribute an outcome to an action, in light of all alternatives. That's currently impossible, but we can look for trends and develop models around them. We'd also have to truly understand the complex nature of intention, and that's practically impossible to do for one's self, let alone others. Rarely is there a singular reason for anyone to do anything. Even if you go with just the most likely reason, you're losing data. It's totally fine to look at it from the lenses you describe (and more!), but it's important to remember that these lenses only show you a part of a very complicated whole. Moreover, many of the distinctions between them are disagreements over definitions. My personal take is that altruism is generally a virtue, but not an obligation. It's hard to know the line between helping and unhealthily enabling, but it does have to be taken into consideration. It is important to remember that not all altruists are virtuous. In the absence of proof, be kind.

Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”.

Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
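To make that failure mode concrete, here is a minimal simulation (an editorial illustration, not part of the original post) of the simplest form of p-hacking: measure many unrelated, pure-noise outcomes, then report whichever comparison happens to clear p < 0.05.

```python
# Illustration only (not from the post): with no real effect anywhere,
# testing enough hypotheses and keeping the best p-value routinely
# produces "significant"-looking results.
import random
import statistics
from math import erf, sqrt

def p_value_two_sample(a, b):
    """Rough two-sided p-value for a difference in means (normal approximation)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(0)
n_outcomes = 20  # measure 20 unrelated outcomes, all pure noise
control = [[random.gauss(0, 1) for _ in range(50)] for _ in range(n_outcomes)]
treated = [[random.gauss(0, 1) for _ in range(50)] for _ in range(n_outcomes)]

p_values = [p_value_two_sample(c, t) for c, t in zip(control, treated)]
print(min(p_values))  # with 20 tests, P(at least one p < 0.05) is about 64%
```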

(Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor:

...
drozdj
Is your concern simply the 'median' researchers' unfamiliarity with basic statistics, or the other variables that typically accompany a researcher without basic statistics knowledge? On a different note: due to your influence/status in the field, I think it would be awesome for you to put together a clear-cut resource detailing what you would like to see the 'median' researchers do that they are not (other than the obvious frustration regarding statistics incompetence stated above).

I don't think statistics incompetence is the One Main Thing; it's just an example which I expect to be relatively obvious and legible to readers here.

PoignardAzur
My default assumption for any story that ends with "And this is why our ingroup is smarter than everyone else and people outside won't recognize our genius" is that the story is self-serving nonsense, and this article isn't giving me any reason to think otherwise. A "userbase with probably-high intelligence, community norms about statistics and avoiding predictable stupidity" describes a lot of academia. And academia has a higher barrier to entry than "taking the time to write some blog articles". The average lesswrong user doesn't need to run an RCT before airing their latest pet theory for the community, so why would it be so much better at selectively spreading true memes than academia is? I would need a much more thorough gears-level model of memetic spread of ideas, one with actual falsifiable claims (you know, like when people do science) before I could accept the idea of LessWrong as some kind of genius incubator.
quetzal_rainbow
I agree that lab leaders are not in a much better position; I just think that lab leaders causally screen off the influence of subordinates, while incentives in the system causally screen off lab leaders.
1111 110th Avenue Northeast, Bellevue

Saturday of Week 47 of 2024

Location: 

Meeting room 5, Bellevue Library

1111 110th Ave NE, Bellevue, WA 98004

Google Maps: https://g.co/kgs/ASXz22S

4hr Free Parking available in underground garage

Contact: cedar.ren@gmail.com

If you can't find us: please repeatedly call 757-279-4582

Bring boardgames

Bring questions

Might do lightning talks

Johannes C. Mayer
I specifically am talking about solving problems that nobody knows the answer to, where you are probably even wrong about what the problem even is. I am not talking about taking notes on existing material. I am talking about documenting the process of generating knowledge. I am saying that I forget important ideas that I generated in the past, probably because they are not yet refined enough to be impossible to forget.

Thank you for the clarification. Do you have a process or methodology for when you try to solve this kind of "nobody knows" problem? Or is it one of those things where the very nature of these problems being so novel means that there is no broad method that can be applied?

A lot of threat models describing how AIs might escape our control (e.g. self-exfiltration, hacking the datacenter) start out with AIs that are acting as agents working autonomously on research tasks (especially AI R&D) in a datacenter controlled by the AI company. So I think it’s important to have a clear picture of how this kind of AI agent could work, and how it might be secured. I often talk to people who seem to have a somewhat confused picture of how this kind of agent setup would work that causes them to conflate some different versions of the threat model, and to miss some important points about which aspects of the system are easy or hard to defend.

So in this post, I’ll present a simple system architecture...

RohanS

I've built very basic agents where (if I'm understanding correctly) my laptop is the Scaffold Server and there is no separate Execution Server; the agent executes Python code and bash commands locally. You mention that it seems bizarre to not set up a separate Execution Server (at least for more sophisticated agents) because the agent can break things on the Scaffold Server. But I'm inclined to think there may also be advantages to this for capabilities: namely, an agent can discover while performing a task that it would benefit from having tools that it d... (read more)
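For readers trying to picture the split being discussed here, a minimal sketch may help. The class names and the /run endpoint below are illustrative assumptions, not the post's actual architecture: the scaffold decides what command to run next, and the only difference is where that command executes.

```python
# Minimal sketch of the scaffold/execution split discussed above.
# Names and the HTTP endpoint are illustrative, not from the post.
import subprocess
import requests


class LocalExecutor:
    """No separate Execution Server: the agent's commands run on the scaffold
    machine itself. Convenient (the agent can build its own tools on the fly),
    but a bad command can break, or exfiltrate from, the scaffold."""

    def run(self, command: str) -> str:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr


class RemoteExecutor:
    """Separate Execution Server: the scaffold only forwards commands over an
    API, so the blast radius of a bad command is limited to the sandbox."""

    def __init__(self, base_url: str):
        self.base_url = base_url  # e.g. an isolated VM or container host

    def run(self, command: str) -> str:
        resp = requests.post(f"{self.base_url}/run", json={"command": command}, timeout=600)
        return resp.json()["output"]


def agent_step(llm_generate, executor, task: str) -> str:
    """One turn of a bare-bones agent loop: ask the model for a shell command,
    run it via whichever executor was chosen, and return the observation."""
    command = llm_generate(f"Task: {task}\nReply with a single bash command.")
    return executor.run(command)
```

On this framing, the capabilities point in the comment becomes a question of whether the sandbox behind something like RemoteExecutor is flexible enough for the agent to install the tools it discovers it needs mid-task.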

Summary

Four months after my post 'LLM Generality is a Timeline Crux', new research on o1-preview should update us significantly toward LLMs being capable of general reasoning, and hence of scaling straight to AGI, and shorten our timeline estimates.

Summary of previous post

In June of 2024, I wrote a post, 'LLM Generality is a Timeline Crux', in which I argue that 

  1. LLMs seem on their face to be improving rapidly at reasoning.
  2. But there are some interesting exceptions where they still fail much more badly than one would expect given the rest of their capabilities, having to do with general reasoning. Some argue based on these exceptions that much of their apparent reasoning capability is much shallower than it appears, and that we're being fooled by having trouble internalizing just how
...

I like trying to define general reasoning; I also don't have a good definition. I think it's tricky.

  • The ability to do deduction, induction, and abduction.

I think you've got to define how well it does each of these. As you noted on that very difficult math benchmark comment, saying they can do general reasoning doesn't mean doing it infinitely well.

  • The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on lim
... (read more)
Seth Herd
I totally agree. Natural language datasets do have the right information embedded in them; it's just obscured by a lot of other stuff. Compute alone might be enough to bring it out. Part of my original hypothesis was that even a small improvement in the base model might be enough to make scaffolded System 2 type thinking very effective. It's hard to guess when a system could get past the threshold of having more thinking work better, like it does for humans (with diminishing returns). It could come from a small improvement in the scaffolding, or a small improvement in memory systems, or even from better feedback from outside sources (e.g., using web searches and better distinguishing good from bad information). All of those factors are critical in human thinking, and our abilities are clearly a nonlinear product of separate cognitive capacities. That's why I expect improvements in any or all of those dimensions to eventually lead to human-plus fluid intelligence. And since efforts are underway on each of those dimensions, I'd guess we see that level sooner rather than later. Two years is my median guess for human-level reasoning on most problems, maybe all. But we might still not have good online learning, allowing, for a relevant instance, the system to be trained on any arbitrary job and then do it competently. Fortunately I expect it to scale past human level at a relatively slow pace from there, giving us a few more years to get our shit together once we're staring roughly human-equivalent agents in the face and start to take the potential seriously.
eggsyntax
Thanks! It seems unsurprising to me that there are benchmarks o1-preview is bad at. I don't mean to suggest that it can do general reasoning in a highly consistent and correct way on arbitrarily hard problems[1]; I expect that it still has the same sorts of reliability issues as other LLMs (though probably less often), and some of the same difficulty building and using internal models without inconsistency, and that there are also individual reasoning steps that are beyond its ability. My only claim here is that o1-preview knocks down the best evidence that I knew of that LLMs can't do general reasoning at all on novel problems. I think that to many people that claim may just look obvious; of course LLMs are doing some degree of general reasoning. But the evidence against was strong enough that there was a reasonable possibility that what looked like general reasoning was still relatively shallow inference over a vast knowledge base. Not the full stochastic parrot view, but the claim that LLMs are much less general than they naively seem. It's fascinatingly difficult to come up with unambiguous evidence that LLMs are doing true general reasoning! I hope that my upcoming project on whether LLMs can do scientific research on toy novel domains can help provide that evidence. It'll be interesting to see how many skeptics are convinced by that project or by the evidence shown in this post, and how many maintain their skepticism. 1. ^ And I don't expect that you hold that view either; your comment just inspired some clarification on my part.
Noosphere89
I definitely agree with that. To put this in a little perspective, o1 does give the most consistent performance so far, and arguably the strongest in a fair competition: https://x.com/MatthewJBar/status/1855002593115939302

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

When rereading [0 and 1 Are Not Probabilities], I thought: can we ever specify our amount of information in infinite domains, perhaps with something resembling hyperreals?

  1. A uniformly random rational number from  is taken. There's an infinite number of options, meaning that prior probabilities are all zero (), so we need an infinite amount of evidence to single out any number.
    (It's worth noting that we have codes that can encode any specific rational number with a finite word - for instance, first apply bijection of rationals to nat
... (read more)
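The truncated parenthetical gestures at encoding any specific rational with a finite word via a bijection with the naturals. As a sketch of one standard construction (an editorial addition; the shortform does not say which bijection it had in mind), the Calkin-Wilf tree enumerates the positive rationals, so the binary expansion of a rational's index is a finite code for it:

```python
# Sketch (not from the shortform): the Calkin-Wilf tree gives a bijection
# between positive integers and positive rationals, so every positive
# rational gets a finite binary code.
from fractions import Fraction


def rational_to_index(r: Fraction) -> int:
    """Position (1-based) of a positive rational in the Calkin-Wilf enumeration."""
    p, q = r.numerator, r.denominator
    bits = []
    while (p, q) != (1, 1):
        if p < q:          # r is the left child of p / (q - p)
            bits.append("0")
            q -= p
        else:              # r is the right child of (p - q) / q
            bits.append("1")
            p -= q
    return int("1" + "".join(reversed(bits)), 2)


def index_to_rational(n: int) -> Fraction:
    """Inverse map: recover the rational from its index."""
    p, q = 1, 1
    for bit in bin(n)[3:]:   # binary digits after the leading 1
        if bit == "0":
            p, q = p, p + q  # descend to the left child
        else:
            p, q = p + q, q  # descend to the right child
    return Fraction(p, q)


assert index_to_rational(rational_to_index(Fraction(355, 113))) == Fraction(355, 113)
print(bin(rational_to_index(Fraction(2, 3))))  # 0b110: a finite code for 2/3
```

Covering negative rationals, or just the rationals in a fixed interval, only requires composing with one more bijection; each individual rational has a finite description, even though no finite amount of evidence singles one out under a uniform prior over infinitely many of them.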

Epistemic status: splitting hairs. Originally published as a shortform; thanks @Arjun Panickssery for telling me to publish this as a full post.

There’s been a lot of recent work on memory. This is great, but popular communication of that progress consistently mixes up active recall and spaced repetition. That consistently bugged me — hence this piece.

If you already have a good understanding of active recall and spaced repetition, skim sections I and II, then skip to section III.

Note: this piece doesn’t meticulously cite sources, and will probably be slightly out of date in a few years. I link to some great posts that have far more technical substance at the end, if you’re interested in learning more & actually reading the literature.

I. Active Recall

When you want to learn...

npostavs
I'm confused, isn't saying their name in a sentence an example of active recall?

hmm, that's fair — i guess there's another, finer distinction here between "active recall" and chaining the mental motion of recalling of something to some triggering mental motion. i usually think of "active recall" as the process of:

  • mental-state-1
  • ~stuff going on in your brain~
  • mental-state-2

over time, you build up an association between mental-state-1 and mental-state-2. doing this with active recall looks like being shown something that automatically triggers mental-state-1, then being forced to actively recall mental-state-2.

with names/faces, i think th... (read more)

Note: thank you Mark Budde, cofounder and CEO of Plasmidsaurus, and Maria Konovalova, a growth/marketing/talented-person at Plasmidsaurus, for talking to me for this article! Also thank you to Eryney Marrogi, who helped answer some of my dumb questions about plasmids.

Introduction

Here’s some important context for this essay: it really, really sucks to start a company in biology.

Despite billions in funding, the brightest minds the world has to offer, and clear market need, creating an enduring company here feels almost impossible. Some of this has to do with the difficulties of engaging with the world of atoms, some of it has to do with the modern state of enormously expensive clinical trials, and some of it still can be blamed on something else. To some degree, this is an unavoidable...

alexey
and seem to be a bit contradictory?

Yeah, I can see that; I guess what I was trying to get across was that Plasmidsaurus did do a lot of cold outreach at the start (and, when they did do it, it was high-effort, thoughtful outreach that took into account the lab's needs), but largely stopped afterwards.